Clustering Posts in Online Discussion Forum Threads

Authors

  • Dina Said
  • Nayer Wanas
Abstract

Online discussion forums are a challenging repository for data mining tasks. A forum usually contains hundreds of threads, each of which may consist of hundreds or even thousands of posts. Clustering posts can help discover outlier and off-topic posts and can support better visualization and exploration of online threads. In this paper, we propose Leader-based Post Clustering (LPC), a modification of the Leader algorithm for clustering posts in discussion board threads. We also suggest using asymmetric pairwise distances to measure the dissimilarity between posts. We further investigate the effect of the indirect distance between posts and how to calibrate it against the direct distance. To evaluate the proposed methods, we conduct experiments on artificial threads and on real threads extracted from the Slashdot and Ciao discussion forums. Experimental results demonstrate the effectiveness of the LPC algorithm when using a linear combination of direct and indirect distances, and when using an averaging approach to compute a representative indirect distance. Furthermore, the results show the potential of the LPC algorithm for detecting off-topic or outlier posts compared with two state-of-the-art methods for off-topic post detection.
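The abstract does not spell out the LPC modifications, so the following is only a minimal sketch of the classic single-pass Leader algorithm that LPC builds on, paired with an illustrative asymmetric distance and a linear blend of direct and indirect distance. The function names, the term-overlap distance, the two-hop averaging rule, and the `alpha` and `threshold` parameters are all assumptions made for illustration, not the authors' definitions.

```python
from typing import Callable, List, Set


def term_distance(a: Set[str], b: Set[str]) -> float:
    """Assumed asymmetric dissimilarity: share of a's terms that b lacks."""
    if not a:
        return 0.0
    return 1.0 - len(a & b) / len(a)


def indirect_distance(i: int, j: int, posts: List[Set[str]]) -> float:
    """Assumed indirect distance: average cost of two-hop paths i -> k -> j."""
    hops = [
        (term_distance(posts[i], posts[k]) + term_distance(posts[k], posts[j])) / 2
        for k in range(len(posts))
        if k != i and k != j
    ]
    # Fall back to the direct distance when there is no intermediate post.
    return sum(hops) / len(hops) if hops else term_distance(posts[i], posts[j])


def combined_distance(i: int, j: int, posts: List[Set[str]], alpha: float = 0.5) -> float:
    """Linear combination of direct and indirect distance (alpha is hypothetical)."""
    direct = term_distance(posts[i], posts[j])
    return alpha * direct + (1 - alpha) * indirect_distance(i, j, posts)


def leader_clustering(
    n_items: int, dist: Callable[[int, int], float], threshold: float
) -> List[List[int]]:
    """Classic Leader algorithm: one pass, first leader within threshold wins."""
    leaders: List[int] = []
    clusters: List[List[int]] = []
    for item in range(n_items):
        for c, leader in enumerate(leaders):
            if dist(item, leader) <= threshold:
                clusters[c].append(item)
                break
        else:
            # No leader is close enough, so this post starts a new cluster.
            leaders.append(item)
            clusters.append([item])
    return clusters


# Toy thread: each post is reduced to a set of terms.
posts = [
    {"linux", "kernel", "patch"},
    {"linux", "kernel"},
    {"pizza", "recipe"},
    {"kernel", "patch", "linux", "driver"},
]
clusters = leader_clustering(
    len(posts), lambda i, j: term_distance(posts[i], posts[j]), threshold=0.5
)
print(clusters)
```

In this toy thread the third post ends up alone in its own cluster, which is the kind of singleton one might flag as an off-topic or outlier candidate. Passing `lambda i, j: combined_distance(i, j, posts)` as `dist` swaps in the blended distance without changing the clustering loop.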

Similar resources

Predicting Helpful Posts in Open-Ended Discussion Forums: A Neural Architecture

Users participate in online discussion forums to learn from others and share their knowledge with the community. They often start a thread with a question or sharing their new findings on a certain topic. We find that, unlike Community Question Answering, where questions are mostly factoid based, the threads in a forum are often open-ended (e.g., asking for recommendations from others) without ...

A Latent Variable Model for Viewpoint Discovery from Threaded Forum Posts

Threaded discussion forums provide an important social media platform. Their rich user-generated content has served as an important source of public feedback. Automatically discovering the viewpoints or stances on hot issues from forum threads is an important and useful task. In this paper, we propose a novel latent variable model for viewpoint discovery from threaded forum posts. Our model is a ...

Predicting Thread Discourse Structure over Technical Web Forums

Online discussion forums are a valuable means for users to resolve specific information needs, both interactively for the participants and statically for users who search/browse over historical thread data. However, the complex structure of forum threads can make it difficult for users to extract relevant information. The discourse structure of web forum threads, in the form of labelled depende...

Negative emotions boost user activity at BBC forum

We present an empirical study of user activity in online BBC discussion forums, measured by the number of posts written by individual debaters and the average sentiment of these posts. Nearly 2.5 million posts from over 18 thousand users were investigated. Scale-free distributions were observed for activity in individual discussion threads as well as for overall activity. The number of unique u...

Negative emotions boost users activity at BBC Forum

We present an empirical study of user activity in online BBC discussion forums, measured by the number of posts written by individual debaters and the average sentiment of these posts. Nearly 2.5 million posts from over 18 thousand users were investigated. Scale-free distributions were observed for activity in individual discussion threads as well as for overall activity. The number of unique us...

Publication date: 2011